AITopics | stochastic gradient method

stochastic gradient method

Optimal Learning for Multi-pass Stochastic Gradient Methods

Junhong Lin, Lorenzo Rosasco

Neural Information Processing Systems | Apr-22-2026, 14:34:43 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, sgm, (14 more...)

Country: Europe (0.46)

Industry: Education (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.52)

Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters

Zeyuan Allen-Zhu, Yang Yuan, Karthik Sridharan

Neural Information Processing Systems | Mar-23-2026, 08:18:05 GMT

The amount of data available in the world is growing faster than our ability to deal with it. However, if we take advantage of its internal structure, data may become much smaller for machine learning purposes. In this paper we focus on one of the fundamental machine learning tasks, empirical risk minimization (ERM), and provide faster algorithms with the help of the clustering structure of the data. We introduce a simple notion of raw clustering that can be efficiently computed from the data, and propose two algorithms based on clustering information. Our accelerated algorithm ClusterACDM is built on a novel Haar transformation applied to the dual space of the ERM problem, and our variance-reduction based algorithm ClusterSVRG introduces a new gradient estimator using clustering. Our algorithms outperform their classical counterparts, ACDM and SVRG respectively.
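
For orientation, the classical SVRG baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration of plain SVRG on a least-squares ERM problem, not the ClusterSVRG estimator from the paper; the function names, the step size, the inner-loop length and the synthetic data are all assumptions chosen for the example.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def grad_i(w, x, y):
    """Gradient of the single-sample loss 0.5 * (w.x - y)^2."""
    r = dot(w, x) - y
    return [r * xj for xj in x]

def full_grad(w, data):
    """Average of the per-sample gradients over the whole dataset."""
    g = [0.0] * len(w)
    for x, y in data:
        for j, gj in enumerate(grad_i(w, x, y)):
            g[j] += gj / len(data)
    return g

def svrg(data, step=0.1, epochs=20, inner_steps=None, seed=0):
    """Plain SVRG: one full gradient per epoch, corrected stochastic steps inside."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0][0])
    m = inner_steps if inner_steps is not None else 2 * n  # assumed inner-loop length
    snapshot = [0.0] * d
    for _ in range(epochs):
        mu = full_grad(snapshot, data)  # full gradient at the snapshot point
        w = list(snapshot)
        for _ in range(m):
            x, y = data[rng.randrange(n)]
            g_w = grad_i(w, x, y)
            g_s = grad_i(snapshot, x, y)
            # Variance-reduced estimator: g_i(w) - g_i(snapshot) + full gradient.
            w = [wj - step * (a - b + c) for wj, a, b, c in zip(w, g_w, g_s, mu)]
        snapshot = w
    return snapshot

def make_data(n=40, w_true=(1.0, -2.0), seed=1):
    """Hypothetical noise-free linear data, used only to exercise the sketch."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in w_true]
        data.append((x, dot(w_true, x)))
    return data
```

Calling `svrg(make_data())` should recover weights close to the generating vector on this toy problem. The correction term `g_i(w) - g_i(snapshot)` is what distinguishes SVRG from plain SGD; the abstract says ClusterSVRG introduces a new gradient estimator that uses clustering, which this sketch does not attempt.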

artificial intelligence, machine learning, vector, (14 more...)

Country: Europe (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences

Hongseok Namkoong, John C. Duchi

Neural Information Processing Systems | Mar-23-2026, 07:48:11 GMT

We develop efficient solution methods for a robust empirical risk minimization problem designed to give calibrated confidence intervals on performance and provide optimal tradeoffs between bias and variance. Our methods apply to distributionally robust optimization problems proposed by Ben-Tal et al., which put more weight on observations inducing high loss via a worst-case approach over a non-parametric uncertainty set on the underlying data distribution. Our algorithm solves the resulting minimax problems at nearly the same computational cost as stochastic gradient descent through the use of several carefully designed data structures. For a sample of size n, the per-iteration cost of our method scales as O(log n), which allows us to give optimality certificates that distributionally robust optimization provides at little extra cost compared to empirical risk minimization and stochastic gradient methods.
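
One common way to write the kind of minimax problem the abstract describes is sketched below. The symbols (loss $\ell$, parameter $\theta$, empirical distribution $\hat P_n$, divergence $D_f$, radius $\rho$) and the $\rho/n$ scaling of the uncertainty set are assumptions introduced here for illustration; the listing itself does not state the formulation.

$$
\min_{\theta} \; \sup_{P} \Big\{ \mathbb{E}_{P}\big[\ell(\theta; X)\big] \;:\; D_f\big(P \,\|\, \hat P_n\big) \le \tfrac{\rho}{n} \Big\},
\qquad
D_f(P \,\|\, Q) = \int f\!\Big(\frac{dP}{dQ}\Big)\, dQ .
$$

Here the inner supremum reweights the $n$ observations toward those with high loss, and the outer minimization is the robust counterpart of ordinary empirical risk minimization.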

algorithm 1, artificial intelligence, machine learning, (13 more...)

Country:

Europe (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Gradient Methods for Submodular Maximization

Neural Information Processing Systems | Mar-17-2026, 13:03:14 GMT

In this paper, we study the problem of maximizing continuous submodular functions that naturally arise in many learning applications such as those involving utility functions in active learning and sensing, matrix approximations and network inference. Despite the apparent lack of convexity in such functions, we prove that stochastic projected gradient methods can provide strong approximation guarantees for maximizing continuous submodular functions with convex constraints. More specifically, we prove that for monotone continuous DR-submodular functions, all fixed points of projected gradient ascent provide a factor $1/2$ approximation to the global maxima. We also study stochastic gradient methods and show that after $\mathcal{O}(1/\epsilon^2)$ iterations these methods reach solutions which achieve in expectation objective values exceeding $(\frac{\text{OPT}}{2}-\epsilon)$. An immediate application of our results is to maximize submodular functions that are defined stochastically, i.e. the submodular function is defined as an expectation over a family of submodular functions with an unknown distribution. We will show how stochastic gradient methods are naturally well-suited for this setting, leading to a factor $1/2$ approximation when the function is monotone. In particular, it allows us to approximately maximize discrete, monotone submodular optimization problems via projected gradient ascent on a continuous relaxation, directly connecting the discrete and continuous domains. Finally, experiments on real data demonstrate that our projected gradient methods consistently achieve the best utility compared to other continuous baselines while remaining competitive in terms of computational effort.
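
A small sketch of stochastic projected gradient ascent on a monotone DR-submodular function may help make the setting concrete. Everything specific here is an assumption for illustration: the quadratic objective with entrywise non-positive Hessian `H`, the linear term `B`, the budget-plus-box constraint set, the constant step size and the Gaussian gradient noise. It is not the authors' experimental setup.

```python
import random

# Hypothetical objective f(x) = B.x + 0.5 * x^T H x on [0, 1]^3.
# All entries of H are <= 0, so f is DR-submodular; H is indefinite, so f is not concave.
# B + H x >= 0 on the box, so f is monotone there.
B = [1.0, 0.8, 0.6]
H = [[-0.2, -0.3, -0.1],
     [-0.3, -0.2, -0.2],
     [-0.1, -0.2, -0.1]]

def f(x):
    lin = sum(bi * xi for bi, xi in zip(B, x))
    quad = sum(x[i] * H[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))
    return lin + 0.5 * quad

def grad(x):
    # H is symmetric, so the gradient is B + H x.
    return [B[i] + sum(H[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]

def project(v, budget):
    """Euclidean projection onto {x : 0 <= x_i <= 1, sum(x) <= budget}."""
    def clip(shift):
        return [min(1.0, max(0.0, vi - shift)) for vi in v]
    x = clip(0.0)
    if sum(x) <= budget:
        return x
    lo, hi = 0.0, max(v)  # clip(hi) is all zeros, hence feasible
    for _ in range(60):   # bisection on the shift applied before clipping
        mid = 0.5 * (lo + hi)
        if sum(clip(mid)) > budget:
            lo = mid
        else:
            hi = mid
    return clip(hi)

def stochastic_pga(budget=1.5, step=0.1, iters=300, noise=0.05, seed=0):
    """Projected gradient ascent driven by noisy (unbiased) gradient estimates."""
    rng = random.Random(seed)
    x = [0.0] * len(B)
    for _ in range(iters):
        g = [gi + rng.gauss(0.0, noise) for gi in grad(x)]
        x = project([xi + step * gi for xi, gi in zip(x, g)], budget)
    return x
```

The returned point is always feasible because every iterate passes through `project`. The sketch only demonstrates the mechanics of the update; it does not verify the $1/2$ approximation guarantee stated in the abstract.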

artificial intelligence, machine learning, submodular function, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Optimal Learning for Multi-pass Stochastic Gradient Methods

Neural Information Processing Systems | Mar-17-2026, 12:08:18 GMT

We analyze the learning properties of the stochastic gradient method when multiple passes over the data and mini-batches are allowed. In particular, we consider the square loss and show that for a universal step-size choice, the number of passes acts as a regularization parameter, and optimal finite sample bounds can be achieved by early-stopping. Moreover, we show that larger step-sizes are allowed when considering mini-batches. Our analysis is based on a unifying approach, encompassing both batch and stochastic gradient methods as special cases.
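
The idea that the number of passes plays the role of a regularization parameter can be illustrated with a short sketch of multi-pass mini-batch SGD on the square loss. The hold-out rule used to pick the stopping pass, the step size, the batch size and the synthetic data are assumptions made for this example; the paper's analysis concerns theoretical stopping rules rather than this particular heuristic.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def mse(w, data):
    """Mean squared residual of the linear predictor w on the given data."""
    return sum((dot(w, x) - y) ** 2 for x, y in data) / len(data)

def multipass_sgd(train, val, step=0.1, batch=4, max_passes=30, seed=0):
    """Mini-batch SGD on the square loss; keeps the iterate from the best pass."""
    rng = random.Random(seed)
    d = len(train[0][0])
    w = [0.0] * d
    best_w, best_err, best_pass = list(w), mse(w, val), 0
    for p in range(1, max_passes + 1):
        order = list(range(len(train)))
        rng.shuffle(order)  # a fresh random pass over the data
        for start in range(0, len(order), batch):
            idx = order[start:start + batch]
            g = [0.0] * d
            for i in idx:
                x, y = train[i]
                r = dot(w, x) - y  # residual; gradient of 0.5 * r^2 is r * x
                for j in range(d):
                    g[j] += r * x[j] / len(idx)
            w = [wj - step * gj for wj, gj in zip(w, g)]
        err = mse(w, val)
        if err < best_err:  # early stopping: remember the best pass so far
            best_w, best_err, best_pass = list(w), err, p
    return best_w, best_pass, best_err

def make_noisy_data(n, w_true=(1.0, -2.0), sigma=0.1, seed=1):
    """Hypothetical noisy linear data, used only to exercise the sketch."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in w_true]
        data.append((x, dot(w_true, x) + rng.gauss(0.0, sigma)))
    return data
```

A typical call is `multipass_sgd(make_noisy_data(40, seed=1), make_noisy_data(20, seed=2))`, which returns the weights, the pass at which the hold-out error was lowest, and that error. Setting `batch=len(train)` turns each pass into a single batch gradient step, matching the abstract's remark that batch and stochastic gradient methods are special cases of one scheme.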

artificial intelligence, machine learning, proceedings, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.92)

Stochastic Chebyshev Gradient Descent for Spectral Optimization

Neural Information Processing Systems | Mar-16-2026, 23:01:45 GMT

A large class of machine learning techniques requires the solution of optimization problems involving spectral functions of parametric matrices, e.g.

artificial intelligence, machine learning, spectral function, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.39)

Learning with SGD and Random Features

Luigi Carratino, Alessandro Rudi, Lorenzo Rosasco

Neural Information Processing Systems | Feb-19-2026, 20:31:07 GMT

Neural Information Processing Systems http://nips.cc/

complexity, iteration, random feature, (10 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Liguria > Genoa (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Minibatch Stochastic Approximate Proximal Point Methods

Neural Information Processing Systems | Feb-11-2026, 04:58:37 GMT

We corroborate our theoretical results with extensive empirical testing, which demonstrates the gains provided by accurate modeling and minibatching.

artificial intelligence, gradient method, machine learning, (16 more...)

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

2 Framework and assumptions 2.1 Stochastic optimization under time drift Throughout Sections 2-4, we consider the sequence of stochastic optimization problems min

Neural Information Processing Systems | Feb-9-2026, 00:44:29 GMT

Our results concisely explain the interplay between the learning rate, the noise variance in the gradient oracle, and the strength of the time drift. The high-probability results merely assume that the gradient noise and time drift have light tails. Moreover, none of the results require the objectives to have bounded domains.

artificial intelligence, exp, machine learning, (16 more...)

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
